Generic classification system

ABSTRACT

A classification system including a training device and one or more classification devices for classifying one or more vectors other than training vectors. The training device is for selecting which one of a plurality of training classification algorithms best classifies a set of training vectors, and for finding a set of values, of parameters of a generic classification algorithm, that enable the generic classification algorithm to substantially emulate the selected training classification algorithm.

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to the classification of objects and patterns and, more particularly, to a classification system with a generic classification algorithm that is automatically optimized for classifying specific kinds of objects or patterns.

It often is useful to classify the members of a set of objects or patterns into two or more classes according to features of the objects or patterns. If the features have numerical values, it often is useful to define the classes by defining a “feature space” as a multidimensional space whose coordinate axes correspond to the features. The feature space is partitioned into portions corresponding to the classes. Each object's respective set of feature values is a vector in this feature space. An object is classified by determining which portion of the feature space its feature value vector lies in.

FIG. 1 is a simple example of a two-dimensional feature space 10 partitioned into two classes. The objects are people. The features are height and weight, so the coordinate axes of feature space 10 are a HEIGHT axis and a WEIGHT axis. Feature space 10 is partitioned into two classes (“obese” and “non-obese”) by a line 12. A person whose (weight-value, height-value) vector lies below and to the right of line 12 is classified as “obese”. A person whose (weight-value, height-value) vector lies above and to the left of line 12 is classified as “non-obese”.

Feature space 10 is a “binary” feature space with two classes. The number of classes into which a feature space is partitioned is application-specific. For example, it may be useful to partition a geographic feature space whose coordinate axis features are “latitude” and “altitude” into three climate classes: “hot”, “temperate” and “cold”.

Feature space 10 is a simple two-dimensional feature space that is easy to visualize and so easy to partition. In practical applications, feature spaces may have tens of feature coordinates. Such high-dimensional spaces are difficult or impossible to visualize. Therefore, classification algorithms have been developed for partitioning high-dimensional feature spaces into classes according to training sets of vectors. Examples of such algorithms include nearest neighbor algorithms, support vector machines and least squares algorithms. See Richard O. Duda et al., Pattern Classification (Wiley Interscience, 2000). For any specific application, a set of training vectors is chosen and is classified manually. The algorithm chosen for partitioning the feature space then selects the boundaries (e.g. hyperplanes) that partition the feature space into classes in accordance with the training vectors. Subsequently, given a new vector of feature values, the algorithm decides which class that vector belongs to. The X's in FIG. 1 represent six training vectors that could be used to train a classification algorithm for partitioning feature space 10. The three X's above and to the left of line 12 would be classified manually as “non-obese”. The three X's below and to the right of line 12 would be classified manually as “obese”. Line 12 is a boundary between the two classes, “obese” and “non-obese”, that would be selected by a least squares classification algorithm.
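As an illustration of how a least squares classification algorithm could select a boundary such as line 12, the following Python sketch fits a linear discriminant to six manually classified (weight, height) vectors by solving the normal equations. The six vectors, the +1/−1 coding of “obese”/“non-obese” and the function names are hypothetical, chosen only for illustration; they are not the values plotted in FIG. 1.

```python
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_least_squares(vectors: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Return w minimizing sum((w0 + w1*x1 + ... - label)**2) over the training set."""
    rows = [[1.0, *v] for v in vectors]  # prepend a constant 1 for the bias term w0
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    return solve(ata, aty)  # normal equations: (A^T A) w = A^T y

def classify(w: Sequence[float], v: Sequence[float]) -> int:
    """+1 ("obese") on one side of the fitted boundary, -1 ("non-obese") on the other."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], v))
    return 1 if score >= 0 else -1

# Hypothetical (weight in kg, height in cm) training vectors, classified manually.
TRAINING_VECTORS = [(110.0, 165.0), (120.0, 170.0), (100.0, 160.0),   # obese
                    (60.0, 180.0), (55.0, 170.0), (70.0, 185.0)]      # non-obese
TRAINING_LABELS = [1, 1, 1, -1, -1, -1]

W = fit_least_squares(TRAINING_VECTORS, TRAINING_LABELS)
print(classify(W, (115.0, 165.0)), classify(W, (58.0, 182.0)))  # prints: 1 -1
```

The fitted coefficients define the boundary w0 + w1·weight + w2·height = 0, which is a straight line in this two-dimensional feature space; the same code applies unchanged to vectors with more features, where the boundary becomes a hyperplane.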

Note that the partitioning of the feature space need not be, and often is not, explicit. In other words, the algorithm chosen for partitioning the feature space actually operates by determining values of algorithm parameters in accordance with the manual classification of the training vectors. These parameter values define the partition boundaries implicitly, in the sense that, given a new vector of feature values, the algorithm decides on which side of the boundaries the new vector falls.

In higher-dimensional examples, the selection of the best classification algorithm to use for a specific application is a difficult task even for a specialist. So, for example, a manufacturer of digital cameras who desires to include in each camera a chip for advising the user of the camera about the quality (acceptable vs. non-acceptable, for example) of each photograph would have to invest in an expensive research and development effort to select and optimize the appropriate classification algorithm. There is thus a widely recognized need for, and it would be highly advantageous to have, a system that could be trained by a non-specialist to perform near-optimum classifications for any particular application.

SUMMARY OF THE INVENTION

According to the present invention there is provided a classification system, including: (a) a training device for: (i) selecting which one of a plurality of training classification algorithms best classifies a set of training vectors, and (ii) finding a set of values, of parameters of a generic classification algorithm, that enable the generic classification algorithm to substantially emulate the selected training classification algorithm; and (b) at least one classification device for classifying at least one vector other than the training vectors, using the generic classification algorithm with the values.

According to the present invention there is provided a classification system, including: (a) a training device for selecting which one of a plurality of classification algorithms best classifies a set of training vectors; and (b) at least one classification device for classifying at least one vector other than the training vectors, using the selected classification algorithm.

The basic system of a first embodiment of the present invention includes two kinds of devices: a training device and one or more (preferably more than one) classification devices. The training device selects which one of a set of two or more training classification algorithms best classifies a set of training vectors, and then finds a set of values, of parameters of a generic classification algorithm, that enable the generic classification algorithm to substantially emulate the training classification algorithm that best classifies the set of training vectors. The classification device(s) use(s) the generic classification algorithm, parametrized with the values that the training device found, to classify other vectors.

Preferably, the classification device(s) is/are reversibly operationally connectable to the training device to receive the generic classification algorithm parameter values that the training device finds.

Preferably, the training device finds the generic classification algorithm parameter values by steps including resampling the feature space of the training vectors, thereby obtaining a set of resampling vectors, and then classifying the resampling vectors using the training classification algorithm that best classifies the set of training vectors. Most preferably, the resampling by the training device resamples the feature space more densely than does the set of training vectors.

Preferably, the training device has the option of dimensionally reducing the set of training vectors before selecting the training classification algorithm that best classifies the set of training vectors, and the classification device(s) also has/have the option to similarly dimensionally reduce the other vectors that it/they classifies/classify.

Preferably, the system also includes, for each classification device, a respective memory for storing the generic classification algorithm parameter values. Each memory is reversibly operationally connectable to the training device and to the memory's classification device.

Preferably, each classification device includes a mechanism for executing the generic classification algorithm. In various embodiments of the classification device, the mechanism includes a general purpose processor, and/or a nonvolatile memory for storing the generic classification algorithm program code, or a field programmable gate array, or an application-specific integrated circuit.

Preferably, the generic classification algorithm is a k-nearest-neighbors algorithm.

Preferably, the training device includes a nonvolatile memory for storing program code for effecting the selection of the best training classification algorithm and the finding of the corresponding parameters of the generic classification algorithm. Most preferably, at least a portion of such code is included in a dynamically linked library.

The basic system of a second embodiment of the present invention also includes a training device and one or more classification devices. The training device selects which one of a set of two or more classification algorithms best classifies a set of training vectors. The classification device(s) use the selected classification algorithm to classify other vectors.

Preferably, each classification device includes a mechanism for executing the selected classification algorithm and a memory for storing an indication of which one of the classification algorithms has been selected by the training device. Alternatively, each classification device itself does not include such a memory. Instead, the system includes, for each classification device, a respective memory, for storing the indication of which one of the classification algorithms has been selected by the training device, that is reversibly operationally connectable to the training device and to the memory's classification device. Under either alternative, most preferably the memory also is for storing at least one parameter of the classification algorithm that has been selected by the training device.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:

FIG. 1 is an example of a feature space with six training vectors;

FIG. 2 is a high-level block diagram of a system of the present invention;

FIG. 3 is the feature space of FIG. 1 with resampling vectors substituted for the training vectors;

FIG. 4 illustrates an alternative construction of the classification device of FIG. 2.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of a system which can be trained by a non-specialist to classify objects for any specific application.

The principles and operation of a classification system according to the present invention may be better understood with reference to the drawings and the accompanying description.

Referring again to the drawings, FIG. 2 is a high-level block diagram of a system 20 of the present invention. System 20 includes two major components: a training device 30 and a classification device 40.

Training device 30 is represented functionally in FIG. 2 as a flow chart of the activities of training device 30. Structurally, training device 30 preferably is a general purpose computer that is programmed to implement the illustrated flow chart. A dynamically linked library for implementing the illustrated flow chart is stored in a non-volatile memory (e.g. a hard disk) of the computer.

The input to training device 30 is a set 22 of training vectors that have been classified manually. In block 34, set 22 is used as input to several classification algorithms, each applied independently, to determine which of these classification algorithms is the best algorithm to use in the application from which the training vectors of set 22 have been selected. These classification algorithms are called “training classification algorithms” herein. One way to determine the best algorithm to use is the “leave one out” method. Given a set 22 of N training vectors, each training classification algorithm is run N times on set 22, each time leaving out a different one of the vectors: the algorithm is trained on the remaining N-1 vectors and then classifies the vector that was left out. The training classification algorithm that duplicates the manual classification of the largest number of “left out” vectors is selected in block 36 as the “best” training classification algorithm for the application at hand.
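The “leave one out” selection of blocks 34 and 36 can be sketched as follows. Each training classification algorithm is modeled, by assumption, as a function that trains on labeled vectors and returns a predictor; the two candidates shown (1-nearest-neighbor and nearest class mean), the data and the tie-breaking rule (the first-listed algorithm wins a tie) are illustrative choices, not ones specified above.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]
Predictor = Callable[[Vector], str]
Trainer = Callable[[List[Vector], List[str]], Predictor]

def train_1nn(vectors: List[Vector], labels: List[str]) -> Predictor:
    """1-nearest-neighbor: a new vector gets the class of the closest training vector."""
    def predict(v: Vector) -> str:
        i = min(range(len(vectors)), key=lambda j: math.dist(v, vectors[j]))
        return labels[i]
    return predict

def train_nearest_mean(vectors: List[Vector], labels: List[str]) -> Predictor:
    """Nearest class mean: a new vector gets the class whose mean vector is closest."""
    means = {}
    for c in sorted(set(labels)):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        means[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    def predict(v: Vector) -> str:
        return min(means, key=lambda c: math.dist(v, means[c]))
    return predict

def leave_one_out_hits(trainer: Trainer, vectors: List[Vector], labels: List[str]) -> int:
    """Count the left-out vectors whose manual classification the algorithm reproduces."""
    hits = 0
    for i in range(len(vectors)):
        predictor = trainer(vectors[:i] + vectors[i + 1:], labels[:i] + labels[i + 1:])
        hits += int(predictor(vectors[i]) == labels[i])
    return hits

def select_best(trainers: Dict[str, Trainer], vectors: List[Vector],
                labels: List[str]) -> Tuple[str, Dict[str, int]]:
    scores = {name: leave_one_out_hits(t, vectors, labels) for name, t in trainers.items()}
    return max(scores, key=scores.get), scores  # max() keeps the first of tied names

# Hypothetical one-dimensional training set: class "A" occupies two separate clusters.
VECTORS = [(0.0,), (1.0,), (10.0,), (11.0,), (5.0,), (6.0,)]
LABELS = ["A", "A", "A", "A", "B", "B"]
TRAINERS = {"1-nearest-neighbor": train_1nn, "nearest-mean": train_nearest_mean}
best, scores = select_best(TRAINERS, VECTORS, LABELS)
print(best, scores)  # prints: 1-nearest-neighbor {'1-nearest-neighbor': 6, 'nearest-mean': 0}
```

On this deliberately awkward data the class mean of “A” falls between its two clusters, so the nearest-mean rule misclassifies every left-out vector, while 1-nearest-neighbor reproduces all six manual classifications and is selected.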

In block 38, values of parameters of a generic classification algorithm are selected that enable that generic classification algorithm to emulate the best training classification algorithm. One way to do this is to resample the feature space that is sampled by training set 22, to use the best training classification algorithm to classify the resampling vectors, and to use the resampling vectors thus classified to train the generic classification algorithm. Preferably, the resampling vectors are distributed randomly in the feature space. The resulting parameter values are output as a generic parameter set 24. FIG. 3 illustrates schematically what is involved in this kind of resampling. Specifically, FIG. 3 is FIG. 1 with the substitution of resampling vectors, represented by “+”s, for the six training vectors. The least squares classification algorithm that produced line 12 in response to the six training vectors would classify the “+”s above and to the left of line 12 as “non-obese” and the “+”s below and to the right of line 12 as “obese”.
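Block 38 can be sketched in the same spirit, with a k-nearest-neighbors rule (the preferred generic classification algorithm) as the emulator. The sketch assumes that the resampling vectors are drawn uniformly at random within the bounding box of the training vectors, and that the generic parameter set consists of k together with the labeled resampling vectors. The names emulate_with_knn and knn_classify, the sample count, the value of k and the stand-in “best” classifier are all hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

def bounding_box(vectors: Sequence[Vector]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) range of the region sampled by the training vectors."""
    return [(min(xs), max(xs)) for xs in zip(*vectors)]

def emulate_with_knn(best_predict: Callable[[Vector], str], training_vectors: Sequence[Vector],
                     n_samples: int = 500, k: int = 5, seed: int = 0) -> Dict:
    """Resample the feature space densely and label each resampling vector with the best algorithm."""
    rng = random.Random(seed)
    box = bounding_box(training_vectors)
    samples = [tuple(rng.uniform(lo, hi) for lo, hi in box) for _ in range(n_samples)]
    return {"k": k, "vectors": samples, "labels": [best_predict(s) for s in samples]}

def knn_classify(params: Dict, v: Vector) -> str:
    """Generic classification algorithm: majority vote among the k nearest resampling vectors."""
    ref = params["vectors"]
    nearest = sorted(range(len(ref)), key=lambda i: math.dist(v, ref[i]))[: params["k"]]
    return Counter(params["labels"][i] for i in nearest).most_common(1)[0][0]

# Stand-in for the selected training classification algorithm: a hypothetical linear boundary.
def best_predict(v: Vector) -> str:
    weight, height = v
    return "obese" if weight - height + 60.0 > 0 else "non-obese"

TRAINING = [(110.0, 165.0), (120.0, 170.0), (100.0, 160.0),
            (60.0, 180.0), (55.0, 170.0), (70.0, 185.0)]
GENERIC_PARAMETER_SET = emulate_with_knn(best_predict, TRAINING)
print(knn_classify(GENERIC_PARAMETER_SET, (118.0, 162.0)))  # far from the boundary: obese
```

An odd k avoids ties between two classes. Because only k and the labeled resampling vectors are stored, classification device 40 needs no knowledge of which training classification algorithm produced the labels.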

Classification device 40 includes a non-volatile memory 42 for storing the generic parameter values of set 24 and a classifying mechanism 44 that uses the generic classification algorithm, as parameterized by the parameter values stored in memory 42, to classify any new feature vector that is presented to classifying mechanism 44. Classifying mechanism 44 may be implemented in hardware, firmware or software. For example, a preferred software implementation of classifying mechanism 44 includes a non-volatile memory for storing program code of the generic classification algorithm, a general purpose processor for executing the code, and a random access memory into which the program code instructions are loaded in order to be executed. A preferred hardware implementation of classifying mechanism 44 includes a field programmable gate array or an application-specific integrated circuit that is hardwired to implement the generic classification algorithm.

The preferred generic classification algorithm is a k-nearest-neighbors algorithm.

Some features of the vectors of training set 22 may be irrelevant to the classification of these vectors. For example, the color of a person's hair has no bearing on whether that person is obese. Including values of the feature “hair color” in the vectors of a training set for training a classification algorithm to partition feature space 10 would just introduce noise to the training process. It is obvious in this simple example not to include a “hair color” feature in an obesity training set; but in practical cases of higher dimensionality it is not obvious which features or combinations of features to exclude from the training vectors. Therefore, optionally, before the training classification algorithms are executed in block 34, the dimensionality of the feature space described by training set 22 is reduced in block 32, using a procedure such as principal component analysis that culls irrelevant dimensions from the feature space. For example, given a three-dimensional set of vectors of values of the features “height”, “weight” and “hair color”, classified manually into the classes “obese” and “non-obese”, principal component analysis would determine that the “hair color” feature is irrelevant and reduce the dimensionality of the feature space of the set to two, as in FIG. 1. Because the generic classification algorithm parameters that then are selected in block 38 are adapted to a feature space of reduced dimensionality, a description of the results of the principal component analysis is included in generic parameter set 24 so that classifying mechanism 44 can project new feature vectors into the reduced-dimensionality feature space before classifying them. This procedure is referred to in the appended claims as “dimensionally reducing” the training vector set and the new vector. This dimensional reduction, in addition to making classification device 40 more efficient, may also improve the quality of the algorithm training by training device 30 when training set 22 is relatively small.
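The optional dimensional reduction of block 32, together with the projection that classifying mechanism 44 applies to new vectors, can be sketched with principal component analysis computed by power iteration with deflation. This is one possible implementation, assumed here for illustration. Principal component analysis keeps the directions of largest variance, so the sketch discards a feature such as “hair color” only when, as in the hypothetical data below, that feature contributes little variance. The hair-color coding, the data and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

def covariance(vectors: Sequence[Sequence[float]], mu: Sequence[float]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1)."""
    d, n = len(mu), len(vectors)
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [x - m for x, m in zip(v, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov

def principal_components(cov: List[List[float]], m: int, iters: int = 200,
                         seed: int = 0) -> List[List[float]]:
    """Top-m unit eigenvectors of a symmetric matrix, by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(m):
        v = [rng.uniform(0.5, 1.5) for _ in range(d)]  # random positive starting vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))  # Rayleigh quotient
        comps.append(v)
        for i in range(d):  # deflate: remove the component just found
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return comps

def project(v: Sequence[float], mu: Sequence[float], comps: Sequence[Sequence[float]]) -> List[float]:
    """Projection that classifying mechanism 44 would apply to a new vector before classifying it."""
    return [sum((x - m) * c for x, m, c in zip(v, mu, comp)) for comp in comps]

# Hypothetical (height, weight, hair color code) vectors; hair color varies very little.
TRAINING = [(165.0, 110.0, 0.2), (170.0, 120.0, 0.8), (160.0, 100.0, 0.5),
            (180.0, 60.0, 0.3), (170.0, 55.0, 0.7), (185.0, 70.0, 0.4)]
MU = mean_vector(TRAINING)
COMPS = principal_components(covariance(TRAINING, MU), m=2)
print(project((175.0, 90.0, 0.6), MU, COMPS))  # a two-dimensional vector
```

Under this sketch, the “description of the results of the principal component analysis” included in generic parameter set 24 would be the mean vector MU and the retained components COMPS.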

Classification device 40 preferably is physically separate from training device 30 and is reversibly operationally connected to training device 30 only for the purpose of loading generic parameter set 24 into memory 42. For example, the manufacturer of digital cameras mentioned above trains the generic classification algorithm using training device 30 and an appropriate set 22 of training vectors, and then equips each one of the cameras with its own classification device 40, implemented e.g. as a set of integrated circuits in a multi-chip package, with the parameter values of generic parameter set 24 loaded in its memory 42.

FIG. 4 illustrates an alternative construction of classification device 40. Classification device 40 of FIG. 4 lacks a memory 42. Instead, memory 42 is included in a physically separate memory device 50. Classification device 40 and memory device 50 include respective interfaces 46 and 52 that enable memory 42 to be reversibly operationally connected to classification device 40. In this embodiment of the system of the present invention, training device 30 includes a similar interface to enable memory 42 to be reversibly operationally connected to training device 30. After training device 30 has determined generic parameter set 24, memory device 50 is operationally connected to training device 30 and generic parameter set 24 is loaded into memory 42. Then memory device 50 is disconnected from training device 30 and is connected to classification device 40 as shown in FIG. 4.

That training device 30 is (preferably) a general purpose computer makes it easy to replace the training classification algorithms and the generic classification algorithm with improved algorithms. This replacement is most conveniently done by downloading the new algorithms from an external source such as the Internet.

Because system 20 is self-contained and allows a user of system 20 to implement near-optimal classification without the assistance of specialists, the user can maintain the confidentiality of training set 22.

An alternative embodiment of the present invention lacks the generic classification algorithm. Instead, both training device 30 and classifying mechanism 44 share the same set of classification algorithms. Training device 30 selects the classification algorithm that best classifies training set 22, as above. Then, instead of selecting parameters for a generic classification algorithm, training device 30 prepares a bit string that indicates which of the classification algorithms is the best algorithm. This bit string is transferred to classification device 40, which therefore subsequently knows which of its classification algorithms to use to classify new feature vectors. Along with this bit string, training device 30 sends classification device 40 a set of parameters that defines for classification device 40 the partition of the feature space that the best algorithm determined. For example, if the best algorithm is a least squares algorithm then training device 30 sends classification device 40 the parameters of a hypersurface analogous to line 12 of FIG. 1. FIGS. 2 and 4, in addition to illustrating the first embodiment of the present invention, also serve to illustrate this alternative embodiment, with the understanding that block 38 is deleted and that “generic parameter set” 24 now includes the bit string that indicates to classification device 40 which classification algorithm to use and the parameters that define the partition of the feature space that was determined by the best algorithm.
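For the alternative embodiment, the transfer from training device 30 to classification device 40 can be sketched as a compact record: an index into an algorithm table shared by both devices, followed by the parameters of the selected algorithm (for a least squares algorithm, the coefficients of the separating hyperplane). The byte layout, the algorithm table, the coefficient values and the function names below are hypothetical; the “bit string” is modeled as a Python bytes object built with the standard struct module.

```python
import struct
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical algorithm table, assumed to be identical on both devices.
ALGORITHMS = ["least_squares", "nearest_mean", "k_nearest_neighbors"]
HEADER = "<BH"  # 1-byte algorithm index, 2-byte parameter count, little-endian, no padding

def encode(algorithm: str, params: Sequence[float]) -> bytes:
    """Training device 30: pack the selected algorithm's index and its parameters."""
    return struct.pack(f"{HEADER}{len(params)}d", ALGORITHMS.index(algorithm), len(params), *params)

def decode(blob: bytes) -> Tuple[str, List[float]]:
    """Classification device 40: recover the algorithm name and its parameters."""
    index, count = struct.unpack_from(HEADER, blob, 0)
    params = struct.unpack_from(f"<{count}d", blob, struct.calcsize(HEADER))
    return ALGORITHMS[index], list(params)

def hyperplane_side(params: Sequence[float], v: Sequence[float]) -> int:
    """Least squares case: params are (w0, w1, ..., wd) of the boundary w0 + w.x = 0."""
    score = params[0] + sum(w * x for w, x in zip(params[1:], v))
    return 1 if score >= 0 else -1

# Only the least squares branch is sketched; the other table entries would need their own code.
CLASSIFIERS: Dict[str, Callable[[Sequence[float], Sequence[float]], int]] = {
    "least_squares": hyperplane_side,
}

def classify_on_device(blob: bytes, v: Sequence[float]) -> int:
    name, params = decode(blob)
    if name not in CLASSIFIERS:
        raise NotImplementedError(f"no implementation sketched for {name}")
    return CLASSIFIERS[name](params, v)

blob = encode("least_squares", [4.0, 0.03, -0.04])  # hypothetical hyperplane coefficients
print(len(blob), decode(blob))  # prints: 27 ('least_squares', [4.0, 0.03, -0.04])
```

Storing the parameter count in the header lets the same record format carry parameter sets of different sizes for different algorithms.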

As noted above, one of the strengths of the present invention is its ability to enable non-specialists to perform near-optimal classification of vectors in feature spaces of high dimension. There also are low-dimension cases that, because of their complexity, benefit from the present invention. Consider, for example, the following “verification” problem. Access to a facility must be restricted to authorized personnel. For that purpose, the facility is equipped with three biometric authentication devices. The first biometric authentication device measures the iris patterns of people who seek access to the facility. The second biometric authentication device measures the facial features of people who seek access to the facility. The third biometric authentication device measures fingerprints of people who seek access to the facility. Each biometric authentication device also reads an identity card of a person seeking access to the facility (which identity card, of course, must be an identity card of a person who is authorized to have access to the facility), compares its biometric measurement to a corresponding measurement in a database of such measurements made on people with authorized access, and produces a number representative of the probability that the person seeking access is the person identified by the identity card. The facility manager wants to combine the three biometric measurements in order to minimize false positives and false negatives. The present invention allows the facility manager to do this without being or hiring a classification specialist. In the context of the present invention, the probability produced by each biometric authentication device is a value of a corresponding feature in a three-dimensional feature space. 
The facility manager generates a training set 22 for system 20 by assembling a suitably large and varied population of people and by using the biometric authentication devices to make many measurements of respective biometric signatures of each member of the population. For each member of the population, one of these measurements is designated as a reference measurement, and the remaining measurements are transformed into corresponding training vectors by combining them with the reference measurement of that member of the population. These training vectors are classified as “access authorized”. Then the remaining measurements of that member of the population are transformed into another set of corresponding training vectors by combining them with the reference measurement of a different member of the population who is selected at random. These training vectors are classified as “access denied”. System 20 then is trained and implemented as described above.
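The construction of training set 22 for this verification problem can be sketched as follows. The sketch assumes that each member's measurements are taken in sessions, so that the i-th iris, face and fingerprint measurements belong together; that the first session serves as the reference measurement; that one other member is drawn at random per member; and that each device's comparison is replaced by a hypothetical match_score function. None of these details is fixed by the description above, and the measurements are simulated.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

MODALITIES = ("iris", "face", "fingerprint")
Measurements = Dict[str, Dict[str, List[Tuple[float, ...]]]]  # person -> modality -> sessions

def match_score(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical stand-in for a device's probability-like output, in (0, 1]."""
    return math.exp(-math.dist(sample, reference))

def build_training_set(measurements: Measurements,
                       rng: random.Random) -> List[Tuple[Tuple[float, ...], str]]:
    people = list(measurements)
    training = []
    for person in people:
        own = measurements[person]
        other = measurements[rng.choice([p for p in people if p != person])]
        sessions = min(len(own[m]) for m in MODALITIES)
        for i in range(1, sessions):  # session 0 is the reference measurement
            genuine = tuple(match_score(own[m][i], own[m][0]) for m in MODALITIES)
            impostor = tuple(match_score(own[m][i], other[m][0]) for m in MODALITIES)
            training.append((genuine, "access authorized"))
            training.append((impostor, "access denied"))
    return training

def simulate(n_people: int = 3, sessions: int = 4, seed: int = 1) -> Measurements:
    """Synthetic measurements: a random template per person and modality, plus small noise."""
    rng = random.Random(seed)
    data: Measurements = {}
    for p in range(n_people):
        data[f"person{p}"] = {}
        for m in MODALITIES:
            center = [rng.uniform(0.0, 10.0) for _ in range(4)]
            data[f"person{p}"][m] = [tuple(c + rng.gauss(0.0, 0.1) for c in center)
                                     for _ in range(sessions)]
    return data

TRAINING_SET = build_training_set(simulate(), random.Random(2))
print(len(TRAINING_SET))  # 3 people x 3 non-reference sessions x 2 labels = 18
```

Each resulting three-dimensional vector of scores is a point in the feature space described above, and the labeled set can be fed to training device 30 like any other training set 22.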

The present invention, in addition to being useful to users who lack the expertise to develop classification algorithms that are optimized for their own specific applications, also is useful to users who do have such expertise. The present invention, by its generic nature, spares such a user the time and expense of developing and manufacturing a classification device 40 that is custom-tailored to that user's specific needs, even if the user is capable of doing so.

While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made. 

1. A classification system, comprising: (a) a training device for: (i) selecting which one of a plurality of training classification algorithms best classifies a set of training vectors, and (ii) finding a set of values, of parameters of a generic classification algorithm, that enable said generic classification algorithm to substantially emulate said selected training classification algorithm; and (b) at least one classification device for classifying at least one vector other than said training vectors, using said generic classification algorithm with said values.
 2. The classification system of claim 1, wherein each said at least one classification device is reversibly operationally connectable to said training device.
 3. The classification system of claim 1, wherein said training vectors sample a feature space, and wherein said finding is effected by steps including: (A) resampling said feature space, thereby obtaining a set of resampling vectors; and (B) classifying said resampling vectors using said training classification algorithm that best classifies said set of training vectors.
 4. The classification system of claim 3, wherein said resampling resamples said feature space more densely than said feature space is sampled by said training vectors.
 5. The classification system of claim 1, comprising a plurality of said classification devices.
 6. The classification system of claim 1, wherein said training device is further operative to dimensionally reduce said set of training vectors prior to said selecting of said training classification algorithm that best classifies said set of training vectors, and wherein each said at least one classification device is further operative to dimensionally reduce said at least one other vector in a like manner prior to said classifying of said at least one other vector.
 7. The classification system of claim 1, further comprising: (c) for each said classification device, a respective memory, for storing said values, that is reversibly operationally connectable to said training device and to said each classification device.
 8. The classification system of claim 1, wherein each said classification device includes a mechanism for executing said generic classification algorithm.
 9. The classification system of claim 8, wherein said mechanism includes a general purpose processor.
 10. The classification system of claim 8, wherein said mechanism includes a nonvolatile memory for storing program code of said generic classification algorithm.
 11. The classification system of claim 8, wherein said mechanism includes a field programmable gate array.
 12. The classification system of claim 8, wherein said mechanism includes an application-specific integrated circuit.
 13. The classification system of claim 1, wherein said generic classification algorithm is a k-nearest-neighbors algorithm.
 14. The classification system of claim 1, wherein said training device includes a nonvolatile memory for storing program code for effecting said selecting and said finding.
 15. The classification system of claim 14, wherein at least a portion of said program code is included in a dynamically linked library.
 16. A classification system, comprising: (a) a training device for selecting which one of a plurality of classification algorithms best classifies a set of training vectors; and (b) at least one classification device for classifying at least one vector other than said training vectors, using said selected classification algorithm.
 17. The classification system of claim 16, wherein each said at least one classification device includes: (i) a mechanism for executing said classification algorithms; and (ii) a memory for storing an indication of which one of said classification algorithms has been selected by said training device.
 18. The classification system of claim 17, wherein said memory is also for storing at least one parameter of said classification algorithm that has been selected by said training device.
 19. The classification system of claim 16, wherein each said at least one classification device includes a mechanism for executing said classification algorithms, and wherein the classification system further comprises: (c) for each said classification device, a respective memory, for storing an indication of which one of said classification algorithms has been selected by said training device, that is reversibly operationally connectable to said training device and to said each classification device.
 20. The classification system of claim 19, wherein said respective memory is also for storing at least one parameter of said classification algorithm that has been selected by said training device. 